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Convergence and Fluctuations 
of Regularized Tyler Estimators 

Abla Kammoun, Remain Couillet, Frederic Pascal, Mohamed-Slim Alouini 


Abstract —This article studies the behavior of regularized Tyler 
estimators (RTEs) of scatter matrices. The key advantages of 
these estimators are twofold. First, they guarantee hy constrnc- 
tion a good conditioning of the estimate and second, being a 
derivative of robust Tyler estimators, they inherit their robustness 
properties, notably their resilience to the presence of outliers. 
Nevertheless, one major problem that poses the nse of RTEs in 
practice is represented by the question of setting the regulariza¬ 
tion parameter p. While a high value of p is likely to push all 
the eigenvalues away from zero, it comes at the cost of a larger 
bias with respect to the population covariance matrix. A deep 
understanding of the statistics of RTEs is essential to come up 
with appropriate choices for the regularization parameter. This is 
not an easy task and might be ont of reach, unless one considers 
asymptotic regimes wherein the number of observations n and/or 
their size N increase together. First asymptotic results have 
recently been obtained under the assumption that N and n are 
large and commensurable. Interestingly, no resnlts concerning the 
regime of n going to infinity with N fixed exist, even though the 
investigation of this assumption has usually predated the analysis 
of the most difficult N and n large case. This motivates our 
work. In particular, we prove in the present paper that the RTEs 
converge to a deterministic matrix when n ^ oo with N fixed, 
which is expressed as a function of the theoretical covariance 
matrix. We also derive the fluctuations of the RTEs around 
this deterministic matrix and establish that these fluctuations 
converge in distribution to a multivariate Gaussian distribution 
with zero mean and a covariance depending on the population 
covariance and the parameter p. 


I. Introduction 

The estimation of covariance matrices is at the heart of 
many applications in signal processing and wireless commu¬ 
nications. The frequently used estimator is the well-known 
sample covariance matrix (SCM). Its popularity owes to its 
low complexity and in general to a good understanding of 
its behavior. However, the use of the SCM in practice is 
hurdled by its poor performance when samples contain outliers 
or have an impulsive nature. This is especially the case 
of radar detection applications in which the noise is often 
modeled by heavy-tailed distributions 0-0- One of the 
reasons why the SCM performs poorly in such scenarios is 
that, as opposed to the case of Gaussian observations, the 
SCM is not the maximum likelihood estimator (MLE) of the 
covariance matrix. This is for instance the case of complex 

A. Kammoun and M.S. Alouini are with the Computer, Electrical, and 
Mathematical Sciences and Engineering (CEMSE) Division, KAUST, Thuwal, 
Makkah Province, Saudi Arabia (e-mail: abla.kammoun@kaust.edu.sa, 
slim, alouini @ kaust. edu. sa) 

R. Couillet and F. Pascal are with Laboratoire des Signaux et Systemes 
(L2S, UMR CNRS 8506) CentraleSupelec-CNRS-Universite Paris-Sud, 
91192 Gif-sur-Yvette, France (e-mail: romain.couillet@centralesupelec.fr, 
frederic.pascal@centralesupelec.fr) 

Couillet’s work is supported by the ERC MORE EC-120133 


elliptical distributions, originally introduced by Kelker Q and 
widely used in radar applications, for which the MLE takes a 
strikingly different form. 

In order to achieve better robustness against outliers, a class 
of covariance estimators termed robust estimators of scatter 
have been proposed by Huber, Hampel and Maronnaj[^-Q, 
and extended more recently to the complex case ||9l-pTf This 
class of estimators can be viewed as a generalization of MLEs, 
in that they are derived from the optimization of a meaningful 
cost function 1121, GD- Aside from robustness to the presence 
of outliers, a second feature whose importance should not be 
underestimated, is the conditioning of the covariance matrix 
estimate. This feature becomes all the more central when the 
quantity of interest coincides with the inverse of the popu¬ 
lation covariance matrix. In order to guarantee an acceptable 
conditioning, regularized robust-estimators, which find their 
roots in the diagonal loading technique due to Abramowitch 
and Carlson 0, 0, were proposed in p^ . The idea is to 
force by construction all the eigenvalues of the robust-scatter 
estimator to be greater than a regularization coefficient p. 

The most popular regularized estimators that are today re¬ 
ceiving increasing interest, are the regularized Tyler estimators 
(RTE), which correspond to regularized versions of the robust 
Tyler estimator 0. In addition to achieving the desired 
robustness property, RTEs present the advantage of being well- 
suited to scenarios where the number of observations is insuf¬ 
ficient or the population covariance matrix is ill-conditioned, 
while their non-regularized counterparts are ill-conditioned 
or even undefined if the number of observations n is less 
than their sizes N. Motivated by these interesting features, 
several works have recently considered the use of RTEs in 
radar detection applications 0, 0-0. While existence 
and uniqueness of the robust-scatter estimator seem to be 
sufficiently studied 0, 0, the impact of the regularization 
parameter on the behavior of the RTE has remained less 
understood. Answering this question is essential in order to 
come up with appropriate designs of the RTE in practice. It 
poses, however, major technical challenges, mainly because it 
necessitates a profound analysis of the behavior of the RTE 
estimator, which is far from being an easy task. As a matter 
of fact, the main difficulty towards studying the behavior of 
the RTE fundamentally lies in its non-linear relation to the 
observations, thus rendering the analysis for fixed n and N 
likely out of reach. In light of this observation, recent works 
have considered asymptotic regimes where n and/or N are 
allowed to grow to infinity. Two regimes can be distinguished; 
the regime of fixed N with n growing to infinity and the 
regime of n and N growing large simultaneously. While the 
former regime, coined the large-n regime, is standard in that 
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it was by far the most considered in the literature, the second 
one, which we will refer to as large-n, N regime, is very 
recent and is particularly driven by the recent advances in 
the spectral analysis of large dimensional random matrices. 
Interestingly, contrary to what one would imagine, very little 
on the behavior of RTE seems to be known in the standard 
regime, whereas very recent results regarding the behavior of 
RTE for the large-n, N regime have recently been obtained in 
eg. One major advantage of the large-n, N regime is 
that, although requiring the use of advanced tools from random 
matrix theory, it often leads to less involved results that let 
themselves to simple interpretation. This interesting feature 
fundamentally inheres in the double averaging effect that leads 
to more compact results in which only prevailing quantities 
remain. However, when N is not so large, the same averaging 
effect is no longer valid and thus cannot be leveraged. A priori, 
assuming that N is fixed entails major changes on the behavior 
of RTEs that have not thus far been grasped. Understanding 
what really happens in the large-n regime, besides its own 
theoretical interest, should lead to alternative results that might 
be more accurate for not so large-A^ scenarios. A second 
motivation behind working under the large-n regime is that 
covariance matrix estimators usually converge in this case to 
deterministic matrices, which opens up possibilities for easier 
handling of the RTE. Encouraged by these interesting practical 
and theoretical aspects, we study in this paper the asymptotic 
behavior of the RTE in the large-n regime. In particular, we 
prove in section |II] that the RTE converges to a deterministic 
matrix which depends on the theoretical covariance matrix and 
the regularization parameter before presenting its fluctuations 


around this asymptotic limit in section III Numerical results 


where G are Gaussian zero-mean random vectors with 
covariance I^r and S^r A 0 is the population covariance 
matrix. The regularized robust scatter estimator that will be 
considered in this work is that defined in as the unique 
solution C]s[{p) to: 


Cn{p) = (1-p) 


n ^ 


E t 


2 = 1 iV" 




(P)^: 


^+plN- ( 1 ) 


with p G (max(0,1—^), l]pl Obviously, Chen’s estimator is 
more involved and will not be thus considered in this work. 
Such an estimator can be thought of as a hybrid robust- 
shrinkage estimator reminding Tyler’s M-estimator of scale 
and Ledoit-Wolf’s shrinkage estimator | |22) . It will be 
coined thus Regularized-Tyler estimator (RTE), and defines a 
class of regularized-robust scatter estimators indexed by the 
regularization parameter p. When n > N, hy varying p from 


0 to 1, one can move from the unbiased Tyler-estimator 1231 


to the identity matrix (p = 1) which corresponds to a trivial 
estimate of the unknown covariance matrix S^r. 

A. Review of the results obtained in the large-n, N regime 

Letting CAf = f ’ the large-n, N regime will refer in the 
sequel to the one where n —>■ oo and N —i' oo with cn ^ c G 
(0,(X)). 

As mentioned earlier, unless considering particular assump¬ 
tions on Sjv, the RTE cannot be proven to converge (in 
any usual matrix norm) to some deterministic matrix in the 


large-n, regime. Eailing that, the approach pursued in |20| 


are Anally provided in order to support the accuracy of the 
derived results. 

Notation. In this paper, the following notations are used. 
Vectors are defined as column vectors and designated with 
bold lower case, while matrices are given in bold upper 
case. The norm notation ||.|| refers to the spectral norm for 
matrices and Euclidean norm for vectors while the norm 
ll-llFro refers to the Erobenius norm of matrices. Notations 
{.y (.)*, (.) denotes respectively transpose, Hermitian (i.e. 
complex conjugate transpose) and pointwise conjugate. Be¬ 
sides, In denotes the NxN identity matrix, for a matrix 
A, Amin (A) and Amax(A) denote respectively the smallest 
and largest eigenvalues of A, while notation vec(A) refers 
to the vector obtained by stacking the columns of A. For 
A, B two positive semi-definite matrices, A ^ B means 
that B —A is positive semi-definite. Xn = Op(l) implies the 
convergence in probability to zero of A„ as n goes to infinity 
and Xn = Op(\) implies that A„ is bounded in probability. 
The arrow designates almost sure convergence while 

T) 

the arrow"—refers to convergence in distribution. 

II. Convergence of the regularized M-estimator 

OF SCATTER MATRIX 

Consider xi, • • • ,x„, n observations of size N defined as: 


X, = 


consists in determining a random equivalent for the RTE, that 
corresponds to a standard matrix model. This finding is of 
utmost importance, since it allows one to replace the RTE, 
whose direct analysis is overly difficult, by another random 
object, for which an important load of results is available. 
The meaning of the equivalence between the RTE and the 
new object will be specified below. 

Prior to presenting the results of pO) , we shall, for the 
reader convenience, gather all the observations’ properties in 
the following assumption: 


Assumption A-1. For i G {1, • 


i}, X, = S 


N 


w,-, with: 


• Wi, • • ■ , w„ are A x 1 independent Gaussian random 
vectors with zero mean and covariance In, 

• Sat G A 0 is such that A tr = 1- 

It is worth noticing that the normalization ^ tr Sjv = 1 is 
considered for ease of exposition and is not limiting since the 
RTE is invariant to any scaling of Sjy. Denote by Sn{p) the 
matrix given by: 

^n{p) = - EA - ■-'^W,W*+pIn, 

lN{p) l-{l-p)CN n ^ 

^Another concurrent RTE is that of Chen {et al 0 which is given as the 
unique solution of 

1 


Cjv(p) = Bjv(p)— trBjv(p) 


where 


1 

Biv(p) = (1-p)- t: 

r) f ^ 




—j-l-pljv- 
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where "fN{p) is the unique positive solution to; 

1 = (p7Ar(p) + (l-p)SAr)"^ 

then Sn{p) is equivalent to the RTE Cjv(p) in the sense of 
the following theorem, 

Theorem 1. For any k > 0 small, define = 

[K+max(0,1 —c“^), l]. Then, as N,n^ oo with c € 

( 0 , 00 ) and assuming limsup |jSAr|| < 00 , we have: 


sup 


Cn{p)-S 


N 


0 . 


B. Convergence of the RTE in the large-n regime 

In this section, we will consider the regime wherein N 
is fixed and n tends to infinity. An illustrative tool that is 
frequently used to handle this regime is the strong law of 
large numbers (SLLN) which suggests replacing the average 
of independent and identically distributed random variables by 
their expected value. This result should particularly serve to 
treat the term 


in the expression of the RTE. Nevertheless, because of the 
dependence of Cjv(p) on the observations x^, the SLLN 
cannot be directly applied to handle the above quantity. As 
we expect Cn{p) to converge to some deterministic matrix, 
say Snip), it is sensible to substitute - , — by 

^ " ^2=1 x*Cjy7p)Xi ^ 


1 . 

to E 


XiX • 


t X*S, 


-. The latter quantity is in turn equivalent 


0 iP)^i 

from the SLLN where the expectation is 


x*So ^(p)x 


taken over the distribution of the random vectors x^. Based 
on these heuristic arguments, a plausible guess is that Cn{p) 
converges to So(p)j the solution to the following equation: 


So(p) =iV(l-p)E 


XX 


.x*So^(p)x 


+ plAr- 


( 2 ) 


The main goal of this section is to establish the convergence 
of Cn{p) to So(p). We will assume that So(p) exists for 
each p G (0,1]. The existence and uniqueness of So(p) will 
be discussed later on in this section. Similar to the large- 
n, N regime, we need to introduce a random equivalent for 
Cjv(p) that is easier to handle. Naturally, an intuitive random 
equivalent is obtained by replacing, in the right-hand side of 
Q, Cn {p) by So(/o). thus yielding: 




X,;X: 




-1 




TpIat- 


(3) 


Unlike Cm{p), ^{p) is more tractable, being an explicit 
function of the observations’ vectors. By the SLLN, S(p) is 
an unbiased estimate of So(p) that satisfies: 

^o(p) = S(p) + e„(p), 


where e„(p) is an NxN matrix whose elements converge 
almost surely to zero and are bounded in probability at the 
rate i.e, 

[‘-(PI.,, = Op (;) ■ 

Lor the above convergence to hold uniformly in p, one needs 
to check that the first absolute second moment of the entries 
of is uniformly bounded in p. To this end we shall 

^*^0 (p)^ 

additionally assume that; 

Assumption A-2. 

Matrix is non-singular, i.e., the smallest eigenvalue of 
'^min(^A:') Satisfies. 

Amin(S^f) > 0. 


Under Assumption the spectral norm of Sg (p) can be 
bounded as: 


Lemma 2. Let Sg be the solution to •HI, whenever it exists. 
Then, 


sup ||So(p)|| < 

pe[K,i] 


IISatII 

Amin(5jAf) 


where k > Q is some positive scalar. 

Proof: See Appendix [A| ■ 

Equipped with the bound provided by Lemma we can 
claim that: 

sup [en{p)]ij =Op 
pG[«.l] 


or equivalently: 


sup 

PG[k,1] 


S(p)-Sg(p) 



Characterizing the rate of convergence of S(p) to Sg(p) is of 
fundamental importance and would later help in the derivation 
of the second-order statistics for S(p) and then for C]s[{p). 

Before stating our first main result, we would like to 
particularly stress the fact that Assumptionj^is not limiting. To 
see that, consider Sat = UAU* the eigenvalue decomposition 
of Sat wherein the diagonal elements of A, Ai, • • • , Aat cor¬ 
respond to the eigenvalues of SAf arranged in the decreasing 
order, i.e., Ai > A 2 • • • > Aap. Denoting by r the rank of Sa^p 
then, Ar+i = • • • = Aat = 0. Write U as U = [Ur,U 7 v-r], 
Uj. G Then, it is easy to see that; 

CN{p)'^N-r = pUw-r 


while; 


1 

u:c^(p)u, = (i-p)-^ 


Ar.2W,-W*Ar 


Tl " i A 

i=l 




■ +plAf) 


(4) 


where = U*Xi follows a Gaussian distribution with 
zero-mean and covariance I^.. Since ^U*Ca/ (p)U.) = 

U*C^^(p)Ur, instead of using Cj^{p), it thus suffices to 
work with U*CA/(p)Ur, for which Assumption|^can be used. 
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The following theorem establishes the convergence of 
Cjv(p) to So(p), the hypothetical solution to (|^, 

Theorem 3. Assume that there exists a unique solution So(p) 
to Let K > 0 be a some small positive real scalar. Then, 
assuming that Assumptions and hold true, one has under 
the large-n regime: 


sup 

PG[k,1] 


C^(p)-So(p) 


0 . 


Moreover, 


sup 

PG[k,1] 


CAr(p)-So(p) 



To this end, consider 


h : 


cN 




{xi, ■ ■ ■ ,xn) N{l-p) E 




N 1 I 12 


E iV 1 




E 


WAr| 


E iV 1 I 

3=1 T-m 


— ■ 

Aat 


Proving that di,--- ,dN are the unique solutions of 0 is 
equivalent to showing that; 


x= h{xi,--- ,xn) 


( 8 ) 


Proof: See Appendix [B] ■ 

In Theorem we establish the convergence of Cn{p) 
to some limiting matrix So(p) that solves the fixed point 
equation Q. While ([^ seems to fully characterize So(p), it 
does not clearly unveil its relationship with the observations’ 
covariance matrix The major intricacy stems from the 
expectation operator in the term E 


— , . A close look 

[x*So (p)xj 

to this quantity reveals that it can be further developed by 
leveraging some interesting features of Gaussian distributed 
vectors. Note first that (|^ is also equivalent to: 


W(l-p)E 






N 


^So(p)S 


N 


(5) 

where w ~ CN'{0,1n)- Let = VDV* be an 

eigenvalue decomposition of where D is a 

diagonal matrix with diagonal elements di,d 2 , - ■ ■ ,dN. Notice 
that, of course the dfs depend on p. However, for simplicity 
purposes, the notation with (p) is omitted. Since the Gaussian 
distribution is invariant under unitary transformation, 0 is 
also equivalent to: 


iV(l-p)E 


w*Dw 




( 6 ) 


It is not difficult to see that the off-diagonal elements of 
E are equal to zero. In effect for i f j, writing 


w*Dw 

Wi as with Rayleigh distributed and 0i indepen¬ 

dent of ri and uniformly distributed over [—7r,7r], one has 


= E 


which can be shown 


taking the expectation over the difference of 
is diagonal, with diagonal 


Dw 


\W^, 


|2 1 


E 

w* JJw 

to ne zero by 
phase 0i — 9j. Therefore, E 
elements given by: 

ai(D)=E 

w*Dw 

Hence, V*SjY^V is also diagonal, thus implying that S^r and 
So(p) share the same eigenvector matrix U. In order to prove 
the existence of So(p), it suffices to check that di, • • • ,dN 
are solutions to the following equation: 


iV(l-pK(D) + ^ = l. 


(7) 


admits a unique positive solution. For this, we show that h 
satisfies the following properties: 

• Nonnegativity: For each xi, - ■ ■ ,xn > 0, vector 

h{xi, - ■ ■ jtcjvjhas positive elements. 

• Monotonicity: For each xi > Xi, - ■ ■ ,xn > 

h{xi,---,XN) > h{xi,-■ ■ ,Xf^) where > holds 

element-wise. 

• Scalability: For each a > 1, ah{xi,--- ,Xjv) > 

h{axi, ■ ■ ■ , uxn)- 

The first item is trivial. The second one follows from the fact 
that h is an increasing function of each Xi. As for the last 
item, it follows by noticing that as p > 0, 


E 


1 |2 

/ 

1 |2 

\Wi\ 

+ -^ < a E 

\Wi\ 



A, V 


A, 


According to |24|, h is a standard interference function, 
and if there exists gi, • • • ,qN such that q > h{qi, ■ ■ ■ , qj^) 
where > holds element-wise, then there is a unique = 
(xi ) * * * 7 Xjv^oo) such that. 


^OO L(xi 007 ‘ ‘ ‘ 7 X7V,Oo)' 

Moreover, Xqo = lim(_ 7 ,oo with x*^°^ > 0 arbitrary and for 
f > 0, x(*+^) , a; To prove the feasibility con¬ 

dition, take q = (g, • • • ,g). Then, h[q, • • • ,g) = (1 —p)g-l-^. 
Setting g > ^ve get that h{q, • • • , g) < q, thereby 

'Amin 

establishing the desired inequality. 

The interest of the framework of Yates | |2^ is that in addi¬ 
tion to being a useful tool for proving existence and uniqueness 
of the fixed-point of a standard interference function, it shows 
that the solution can be numerically approximated by com¬ 
puting iteratively = h{x\, ■ ■ ■ ,x^). However, in order 

to implement this algorithm, one needs to further develop the 
terms ai(D). This is in particular the goal of the following 
lemma, the proof of which is deferred to Appendix [C] 

Lemma 4. Let w = [uii,-- - ,wnY be a standard complex 
Gaussian vector and D = {di, • • • , d^) be a diagonal matrix 
with positive diagonal elements. Consider ai, ■ ■ ■ , ajv, the set 
of scalars given by: 


ai(D) = E 
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Then 


a^{-D) = 


( 


1 1 


XF; 


(TV) 


D 


T di 

i-th 

\ position 


d\ 2 2 


An 




where ^ ii the Lauricella ’s type D hypergeometric func¬ 
tion. 0 


Equipped with the result of Lemma we will now show 
how one can in practice approximate So(p)- First, one needs 
to approximate the solution of (|^. Let d° = d^i \ • • • , d!'^ 

be an arbitrary vector with positive elements. We set = 

••• 

“l I ! “at 


* ^+7V(l-p)ai(diag(d(*))) 

where the expression of Q;i(diag(d(‘))) is given by Lemma 

4 As t ^ oo, d*^*^ tends to d, the vector of eigenval- 

‘—'1,1 

ues of (p)S^ which is the solution of ([^. Since 

Sat and So(p) share the same eigenvectors, the eigenvalues 
si,oo, • • • , Sat ,00 of So(/3) are given by s^^oo = The matrix 
So(p) is finally given by; 


So(p) = Udiag([si,oo, • • • , stv.oo])U*. 

While the above characterization of So(p) seems to provide 
few insights in most cases, it shows that except for the 
particular case of Sat = Iw, the RTE C]sf{p) is biased for 
p S [k, 1) in that; 

^o(p) 7^ ^N- 


methods using RTFs, it becomes of little help when one is 
required to deeply understand their fluctuations, a prerequisite 
that essentially arises in many detection applications. This 
motivates the present section which aims at establishing a 
Central Limit Theorem (CLT) for the RTE. 

It is worth noticing that the scope of applicability of the 
results obtained in the large-n regime is much wider than 
that of the n, N large regime. As a matter of fact, using the 
Delta Method p5) , our result can help obtain the CLT for any 
continuous functional of the RTE. We deeply believe that this 
can facilitate the design of inference methods using RTFs. 

Although treatments of both regimes seem to take different 
directions, they have thus far presented the common denom¬ 
inator of relying on an intermediate random equivalent for 
Cn{p), be it S(p) or SAr(p) (See Theorem[^. It is thus easy 
to convince oneself that in order to derive the CLT for CAr(p), 
a CLT for S(p) is required. 

We denote in the sequel by 6 and S the quantities; 6 = 
vec(CAr(p)) —vec(So) and 6 — vec(S(p)) —vec(So) and 
consider the derivation of the CLT for vectors S and then 
for 6. We will particularly prove that d and S behave in 
the large-n regime as Gaussian random variables that can 
be fully characterized by their covariance matrices E [J^*] 
and E[J^ ]. Starting with the observation that in many signal 
processing applications, the focus might be put on the second- 
order statistics of the real and imaginary parts of 6 and 6, 
we additionally provide expressions for the pseudo-covariance 
matrices E and E[^J ] of S and 6 which, coupled 
with that of covariance matrices, suffice to fully characterize 
fluctuations of the vectors and [3?^ ,5^ ]^. 

We will start by handling the fluctuations of d. To this end, 
we need first to work out the expression of S(p). Recall that 
S(p) is given by; 


To see that, notice that So(p) = '^n implies that D = Iat. 
Replacing D by the identity matrix in (|^ and using the fact 
= ;^lAf shows that only Sa^ = Iat satisfies a 


that E 

null bias. Hence, it appears that improving the conditioning of 
the RTE by using a non-zero regularization coefficient comes 
in general at the cost of a higher bias. 


III. Second order statistics in the large-ti regime 

The previous section establishes the convergence of the RTE 
to the limiting deterministic matrix So(p)- In the following, 
for readability purposes, So(p) will be simply replaced by 
So- The convergence holds in the almost sure sense, and 
can help infer the asymptotic limit of any functional of the 
RTE. More formally, for any functional / continuous around 
So, /(Cat) converges almost surely to /(So). While this 
result can be used to understand the convergence of inference 

^The evaluation of the Lauricella’s type D hypergeometric function is 
performed numerically using its integral representation 

F^\a,bi, - ■ ■ , c; * 1 , ■ ■ ■ ,x„) 

= -5Rc> 5Ra > 0. 

r(a)r(a-c) Jo fA 


± 

n 


i=l 


\ -1" pi AT ■ 


Therefore, 




i-1 'W*S^Sg S/^Wi 


“bp^o ~ItV 


Using the eigenvalue decomposition of S|^Sg ^S^ = 

UDU* and denoting = U*Wi, we thus obtain; 


— 1~ —1 A^fl — p) E)2W,W*D2 


w*Dw,' 


+pU*So^U-lAr. 


From the characterization of So provided in the previous 
section, we can easily check that; 


N{l-p)E 


D 2 ww*D2 
w*Dw 


= I^-pU*So'U 
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Therefore, 

(9) 


( 10 ) 

From ©■ it appears that the asymptotic distribution of 
is Gaussian and thus can be fully characterized 
by its asymptotic covariance and pseudo-covariance matrices. 
Using ( [Tol l, it is easy to show that we need for that the pseudo¬ 
covariance and covariance matrices of: 

1 vec(wiW*) 'vec(ww*)' 

n ^ w*Dwi w*Dw 

i — \ * -■ 

These quantities involve the following set of scalars. 




iV(l-p) 


E 

2=1 


U-I^ 

D2WiW*D5 
w* Dw, 


-E 


D 2 ww*D 2 
w*Dw 


Aj = E 


(w*Dw) 


2 


2 


j = 1, 




for which closed-form expressions need to be derived. This is 
the objective of the following technical lemma, which is of 
independent interest: 

Lemma 5. Let w = [tui,’’' he a standard complex 

Gaussian vector and D = diag(c?i, • • • be a diagonal 

matrix with positive diagonal elements. Consider fdij as 
above. Then jdij are given for i = j and i j by the 
expressions in ([n]), ([T^ and ([T3) at the top of next page. 

With this result at hand, the next Lemma follows immedi¬ 
ately: 


Lemma 6. Let D be N x N diagonal matrix with positive 
diagonal elements. Consider Wi, • • • , w„ n independent com¬ 
plex Gaussian random vectors with zero-mean and covariance 

I 'T-’j I — ( \ vecfwiWi) ttt' vecfww*) \ 

N. Then, yLn [ wdw JJ converges 

to a multivariate Gaussian distribution with covariance B(D) 
and pseudo-covariance G(D) given by: 


B(D) = B(D)-vec(H)vec(H)" (14) 

G(D) = G(D)-vec(H)vec(H)'' (15) 


where 


B(D) = E (vec(ww*)))* 

G(D) =E 

H(D) = diag (ai(D), • • • , Q;Ar(D)) 


(w*Dw)2 

vec(ww*) (vec(ww*)))^ 
(w*Dw)2 


N matrices, i.e, B(D) = 


composed of 

Bi,i 

By 

Bat^ • ■ 

Bat, 


G(D) = 


Gyi 


Gyiv' 

where: 

Giv.i 


Gat,AT. 



Byi = diag(/3i,i • • • ,/?i,Ar) 


Bi j- 

k,i 



= ^{k=i,£=j}Pi,j~^^{k^j,i 


Equipped with Lemma [^ we are now in position to state 
the CLT for S(p), whose proof is omitted being a direct 
consequence of Lemma [^ 

Theorem 7. Let S(p) be given by wherein observations 
Xi, • • • , x„ are drawn according to Assumption [^ Consider 
Sat = UAtvU* the eigenvalue decomposition ofYlM- Denote 
by D the diagonal matrix whose diagonal elements are solu¬ 
tions to the system of equations 0- Then, in the asymptotic 
large-n regime, yCiS = -^/n ^vec(S(p)) —vec(So)^ behaves 
as a zero-mean Gaussian distributed vector with covariance: 

Ml = iV2(l-p)2 B(D) (aJu^oaJu*) 

and pseudo-covariance: 

M2 = N^i-pf (uaJ®ua|) g(d) (a|u*®a|u^) . 

where B(D) and G(D) are given by d and d of 
Lemma |6] 

Now that the fluctuations of S(p) have been determined, 
we are in position to derive the asymptotic distribution of 
vec(CAr(/9)). The very recent results in establishing 

equality between the fluctuations of the bilinear-forms of 
Cjv {p) and those of its random equivalent Sjv (p) in the large- 
n, N regime might lead us to expect similar results to hold in 
the large-n regime. As we will show in the following theorem, 
contrary to these first intuitions, the asymptotic distribution of 
vec(S(p)) is different from that of vec(CAr(p)), even though 
it plays a central role in facilitating its analytical derivation. 

Theorem 8. Under the same setting of Theorem define F 
the X matrix: 

F = A(l-p) (uD5(g)UD5^ B(D) 

with B(D) defined in Lemma Consider Cm{p) the robust 
scatter estimator in ©• Then, in the large-n asymptotic 
regime, ^/nS = ^vec(CAr(p)) —vec(I]o) j behaves as a 
zero-mean Gaussian-distributed vector with covariance: 

Ml = Ml 

and pseudo-covariance: 

M2 = (Iiv=-F)-' M2 
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Pi,i oAT-l 


N 


N , 


2^-^N[N+l)dlY{^^^du 

1 1 
2^N{N+l) dd^Yltidk 

P'^,3 ~ ^ ^ J 


di — h dj^ — 

iV,l... ,1, 3 ,1,... ,l,iV+2,^ 

T 

2-th 

^ position 




djv 


( 11 ) 


( 


Pi,j ~ 


F, 


N 


di — \ dff~' 


2 ,1,---,1, 2 ,1--- , 


V 


T 

2-th 

position 


T 

j-lh 

position 


di 


J'N 


<J 


( 12 ) 

(13) 


Proof: The proof is deferred to Appendix ■ 

TV. Numerical results 

In all our simulations, we consider the case where 
Xi, • • • , x„ are independent zero-mean Gaussian random vec¬ 
tors with covariance matrix S^r of Toeplitz form; 

= I i>j’ 


A. Which regime is expected to be more accurate 


In order to study the behavior of RTE, assumptions letting 
the number of observations and/or their sizes increase to 
infinity are essential for tractability. The behavior of RTE 
is studied under both concurrent asymptotic regimes, namely 
the large-n regime, which underlies all the derivations of 
this paper, and the n, N-large regime recently considered in 
p0| . Given that the scope of the results derived in the large- 
n, N regime, has thus far been limited to the handling of 
bilinear forms, practitioners might wonder to know whether, 
for their specific scenario, further investigation of this regime 
would produce more accurate results. In this first experiment, 
we attempt to answer to this open question by noticing that 
both regimes have the common denominator of producing 
random matrices that act as equivalents to the robust-scatter 
estimator. The accuracy of each regime is thus evaluated by 
measuring the closeness of the robust-scatter estimator to its 
random equivalent proposed by each regime. This closeness 
is measured using the following metrics: 


and 


f 4 _K 

N 


C(p)-S(p) 


2 

Fro 


£n,N = 


N 


C(p)-Sjv(p) 


2 

Fro 


Eigures [T] and represent these metrics with respect to 
the ratio ^ when TV = 4,16,32, 6 = 0.7 and p set to 
0.5. The region over which the use of the large-n regime is 
recommended corresponds to the values of ^ for which the 
£n curve is below the £n,N one. 

Erom these figures, it appears that, as N increases, the 
region over which results derived under the large-n regime 
are more accurate, corresponds to larger values of the ratio 

n 

N- 



Fig. 1. Accuracy of the random equivalent when = 4 



N 

Fig. 2. Accuracy of the random equivalent when = 16 












































N 



b 


Fig. 3. Accuracy of the random equivalent when TV = 32 


Fig. 4. Analysis of the bias for n = 1000 and TV = 2 with respect to b and 
for different values of p. 


B. Asymptotic bias 

In this section, we assess the bias of the RTE with respect to 
the population covariance matrix. Since in many applications 
in radar detection, we only need to estimate the covariance 
matrix up to a scale factor, we dehne the bias as; 


Bias = 


E 


N 


tr 




J-AT 


Fro 


N 


has a bounded spectral norm, the 


dominated convergence theorem implies that: 


Bias- 

n—>-+00 


1 

7 

1_! 





Figure displays the asymptotic and empirical bias with 
respect to the Toeplitz coefficient b and for p = 0.2, 0.5,0.9. 
We note that the bias is an increasing function of b. This is 
expected since for small values of b, the covariance matrix 
becomes close to the identity matrix. The RTE, viewed as a 
shrunk version of the Tyler to the identity matrix will thus 
produce small values of bias. 


C. Central Limit Theorem 

The central limit theorem provided in this paper can help de¬ 
termine fluctuations of any continuous functional of vec(CAr). 
As an application, we consider in this section the quadratic 
form of type ip*C^^(p)p with ||p|| = 1 (used for instance 
for detection in array processing problems p^), for which the 
large-n and the large-n, N regimes predict different kind of 
fluctuations. As a matter of fact, applying the Delta Method 
| [25| , one can easily prove that under the large-n, 

j 

((Soi)-p®So-'p)*Mi ((Sq-I) -p®So-ip) 

Aaa(o,i). 


On the other hand, using results from p0[ , one can prove that 
under the large-n, W regime, ^p*C^^p)p satisfies; 

Tn,N = (^^P*C];,Hp)p-;^P*Qtv(p)p^ ^ A/'(0, 1) 

where: 

2 _ w(-p)^(l-p)^ (^p*SArQ^p)^ 


with p, m{—p) and Q(p) have the same expressions as in 
1201 when C^v in pOt is replaced by Sjy. A natural question 
that arises is which of the two competing results is the most 
reliable for a particular set of values N and n. To answer 
this question, we plot in figures and the Kolmogorov- 
Smirnov distance, between the empirical distribution function 
of Tn and obtained over 50 000 realizations, and the 
standard normal distribution with respect to the ratio ^ when 
b = 0.7j, p = 0.5, p = [1, • • • , 1] and for = 4,16, 32. We 
note that for values of N up to 16, results derived under the 
large-n regime are more accurate for a large range of n while 
the use of the results from the large-n, N regime seems to be 
recommended for N = 32. 


V. Conclusions 

This paper focuses on the statistical behavior of the RTE. It 
is worth noticing that despite the popularity of the RTE, char¬ 
acterizing its statistical properties has remained unclear until 
the work in p0| shedding light on its behavior when the large- 
n, N regime is considered (the number of observations n and 
their size N growing simultaneously to infinity.). Interestingly, 
no results were provided for the standard large-n regime in 
which N is fixed while n goes to infinity. This has motivated 
our work. In particular, we established in this paper that the 
RTE converges, under the large-n regime, to a deterministic 
matrix which differs as expected from the true population 
covariance matrix. An important feature of this results is that 
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n 

N 


Fig. 5. Analysis of the accuracy of the CLT results for TV = 4 



n 

N 


Fig. 6. Analysis of the accuracy of the CLT results for TV = 16 



TV 


it allows for the computation of the asymptotic bias incurred 
by the use of the RTE. We also studied the fluctuations of 
the RTE around its limit and prove that they converge to 
a multivariate Gaussian distribution with zero mean and a 
covariance matrix depending on the true population covariance 
and the regularization parameter. The characterization of these 
fluctuations are paramount to applications of radar detection 
in which RTEs are used. Einally, numerical simulations were 
carried out in order to validate the theoretical results and also 
to assess their accuracy with their counterparts obtained under 
the large-n, N regime. 


Appendix A 
Proof of Lemma[2] 

In the following appendices, for readability purposes, the 
notation So(p) (resp. S(/o)) is simply replaced by Sq (resp. 
S). Of course, the dependence of Sq to p is not omitted. 

Multiplying both sides of Q by we show that Sg 

satisfies: 


(l-p)E 


ww 




w 




■'O'^AT ) 


L at " ^ N ' 

where w is zero-mean distributed with covariance matrix Iat. 


Define A 


Then, 


A = (l-p)E 


WW 


^w*A~iw 




which yields the following bound for ||A||, 


Hence, 


|A|| <(l-p)||A|| + 


lAII < 


Amin(SAf) 




(17) 


Now, ||A|| can be lower-bounded by: 

||A|| = max 
l|x||=l 

> ||So|| max x*S^^uu*S^^x 

l|x|| = l 

> |lSo||u*S^=uu*S^^u 
^ IISqII 

- IISnII’ 


(18) 


where in (a) u is the eigenvector corresponding to the max¬ 
imum eigenvalue of Sg. Combining (|T7]i and ([TSll, we thus 


obtain: 


l|So||< 


IISnII 

Amin(SAf) 


Appendix B 
Prooe of Theorem[3] 

The proof is based on controlling the random elements di (p) 
given by: 


di{p) 


x*CV(p)x,-x*Sg^x, 

\/x-Soixi-yx*C^\p)x, 


Fig. 7. Analysis of the accuracy of the CLT results for TV = 32 
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Recall that, by the SLLN, under the large-n regime, Sq 
satisfies: 


So = fV(l-p)-^ 

n. 




+ plAf +en(p), 




— x^- 


l_p " x,x* (ix*S/x,-ix*C^i(p)x. 




^x*So x,ix*C^i(p)x, 


+e„ 


^0 Xj 


l~Pv^ x*CjY (p)xiX*Sg ■x.jdi{p) 


1-1 


]vXtC^^(p)xi^x*Soix, 


N-^i^O Xj 


+x CjY (p)e„SQ Xj. 


Hence, 


1 —P ^7^iV 


dj(p) = 


^i=l 


di(p) 


AT ^2 ^0 N-' 


+ 


/x*Cjv (p)xjX*So Xj 
XjCjY {p)^n^o Xj 




XjX*So^Xj 


c(p) < 


c(p) 


\/x*Cjy1(p)xjX*So ^Xj 


A 


1 —p (p)XiXj CjY {p)^j 

^ hi ^x*C^i(p)x, 


\ 


1 —P^X*Sg XiX*SQ Xj 


■E' 


Z=1 

+ I1^W^ (p)^n^C 


1 -.f* X 7 1 -y . 

AfXj^O Xi 


Therefore, 

^max(p) ^ 


^inax(p) 


:yx*C-5 (l^-pC^Ap)) CA: 


where e„ is a x iV matrix whose elements converge almost 
surely to zero and satisfy [er!,(p)]i_j = Op{^). 

In the sequel, we prove that for any k > 0, 


sup max \di{p)\ 0. 

p6[„,l]l<*<n' 


For that, we need to work out the differences x*C^^(p)xi — 
x*Sg ^Xi for f = 1, • • • ,n. Using the resolvent identity A^^ — 
(B —A) B~^ for any NxN invertible matrices, 

we obtain: 


XYX*Sg^ (IaT pSg )Sg^Xj X* S g e„Sg Xj 

+ ||C-5(p)e„Sg-^|l. 

Using the relation |x*Ay| < ||x||||A|||jy|j, we thus obtain: 

^max (p) < dmax (p) P^N (^*)ll 

/ Y*vi-ic 

+ ||C^Mp)6„Sg"||. 


||lAl-pSg-^||- 


X*Sg Xj 


Since sup^g[„_i) ||C^A„(p)So = || < ^ sup^gj^^i) |le„(p)|| 
and using the fact that Pat —pCY^^(p)|| < 1, we get: 

^niax (p)< ^niax (p) IjV — pSg-'ll 


Sg ^ e„Sg 


s'l'- 


Again, as ||Sg ^e^Sg ^ || < we have: 


dmax(p) ( l-VllI^-pSo-'||-^/^||e„|| ) < ^||e„||. 


From Lemma 1^ ll^oll < Therefore, for n large 

enough (say large enough for the left-hand parenthesis to be 
greater than zero), 

iii^ II 

O^max(p) ^ 


Let c?max(p) = maxi<j<„ |dj(p)|. By the Cauchy-Schwartz 
inequality, we thus obtain: 


i-\AA^T®F-\/iiM' 

Taking the supremum over p G [k, 1), we finally obtain: 

sup dmax(p) < -==AI^II=- 


thereby showing that dnia^{p) 0 and d^a^p) = Op {^) 
Now, that the control of (imax(p} is performed, we are in 
position to handle the difference CAr(p) —Sg. We have: 




^ ibl X*C^ (p)Xi ^X*Sg Xi 
-^n{p) 

1-p' 


-XiX*d,{p) 


T. 

*=i \/xX?CArPp)x,A/ix*So"xi 


-en{p)- 
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Therefore, 

llC^(p)-So| 

^max(p) 


l-p 


E 


* — 1 \ AT 


nKCn\p)^> 


J^^xcy(-1 


+ lkn(p)||. 

By the Cauchy-Schwartz inequality, we get: 

l-p ■ " 


Appendix D 
Proof of LemmaO 

Again the proof of the results in Lemma [^follows the same 
lines as in Appendix We will only detail the derivations 
for the expressions of /3i i, i = 1, • • • ,N. The same kind of 
calculations can be used to derive that of ^ j. Using 

the relation ^ we write = E 

as: 




(w*Dw)2 


|CAr(p)-So|| < d max ip) 

l-p 


Ex 




i=i 


Kip)\\ 


Pi,i = E 


Wi 




poo poo 


/o ^0 


“'“Mexp(-'u/2) 


N 


N 


or equivalently: 


;exp j —t Ujdj e ■ ■ ■ duN-idudt 

3 = i,j=Ai 


i=l 

N 


|C7v(p) —Soli < dmax(p) 

+ l|e„(p)|l. 


Cat- pi 


■N 


|So —pIat — 


Hr 


-dt. 


2^ ^ Jo i^+td,)^ t=i 

Conducting the change of variable t = - — 1, we obtain: 


Since d„iax(p) 0, to conclude, we need to check that the 
spectral norm of Cat is almost surely bounded. The proof 
is almost the same as that proposed in Lemma to control 
the spectral norm of Sq with the slight difference that the 
expectation operator is replaced by the empirical average, 
and using additionally the fact that ^ 

Details are thus omitted. 


I3i,i = 


{l — v)v" ^dv 


2N-1 




1 -- 


v{di-^)\ -p-rA ,v{^-dk) 


ULiC- 


dk 


-+i) 


N- 


N- 


Appendix C 
Proof of Lemma|4] 

The proof of Lemma is based on the same technique 
in 127 . Using the relation - = /q e °‘*dt, we write j-jjg ^ 5 g Qf ^jjg Slutsky Theorem which allows us to 


Appendix E 
Prooe of TheoremE] 

Our approach is based on a perturbation analysis of 
vec(CA(p)) in the vicinity of the asymptotic limit Sq coupled 


E 


IX 

w*Dw 


as: 


discard terms converging to zero in probability. 


Set A = Sn 


E 


= E 

p + OO 


w*Dw 


Jo 


(Ca(p)- 


— So I Sn ^. Then, 


p + OO /* + oo 


/o ^0 


2N 


e ‘‘^’“uexp(—u/2) 


N 


p+OO 


p + OO 


iV(l-p)^So^x,x:SX , ^-1 . 


X exp 


-t ^idA P e-'^^^‘^dui---duN-idudt Writing Cjv as: 


1 1 


N 

n 


lo 2^ (i+fdi) \ d-tdj 


dt. 


CA = Ca-So + So 


-A)-'So^ 


Conducting the change of variable f = i —1, we eventually 
obtain: 


= So ^ (Ia- 
= So-'-So-^ASo-Eop(|lA||) 


E 


w, 


|2 1 


w*Dw 


1 


„A-1 


0 ^""d^iriid.il-v^-^)-!- 




di-i 


A 

n 


we obtain: 


1 


dj — 


jrdv. 


A = 


Nil-p) 


E 


Sg ^ XiXj Sg 


We finally end the proof by using the integral representation 
of the Lauricella’s type D hypergeometric function. 


j=i x*So Xi-x*So ASg ""Xj+OpdlAI 
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From p5l Lemma 2.12], A writes finally as: 

X*So "" ASn 


^ jV(l-p) ^ Sq ^x,x*So 


+ ^ “Iw + Op 




1 + 




= So"SSo"-lAr 


7V(l-p) So ^x,x*So ^x*So ^ASo ^x. 


■E 

2=1 


«So-ixO^ 


= So=SSo"-lAr 


fV(l-p) ^0 

r7 


^ (xf(So") 


T.c^Y* V 

-^0 


vec(A) 


We will now provide a closed-form expression for E(F). 
To this end, we will use the eigenvalue decomposition of 
Sg ^S^ = UDsU*. Then, letting w = U*w with w = 
SjY^x, we obtain: 

, , ’iV(l-p)UD5(^w^D5U^(g)UD5ww*D5U* 
E(F) = E —^- - - — -f- 

(w*Dw) 

Therefore, 

(d-5U^®D-5U*) E(F) ( 

(w)w’^(8)ww^ 


UD 2(g)UD 2 


i=l 


<Sg-lx,)' 


+Op(|| A| 


Let F be the x iV^ matrix given by: 


F = 


iV(l-p) 


E 

2=1 


vec (Eg ^x^x*So 


■)(xns„-*) 




= fV(l-p)E 
= iV(l-p)E 
= iV(l-p)E 




(w*Dw) 

w)(8)w) (w(8)w*) 


(w*Dw) 
vec(ww*) (vec(ww*)) 


(w*Dw) 


Then, vec(A) satisfies the following system of equations: 
vec(A) = vec ^^Sg ^ SSg ^ —Iat^ +E (F) vec(A) 


= iV(l-p)B(D), 

where B(D) is provided by Lemma|^ A closed-form expres¬ 
sion for F = E(F) is thus given by: 

-h(F-E(F))5-f0,(11^11). (19) F = ^(1-P) (uDi^UDi) B(D) (d^U^^D^U*) . 

Given that the two last terms in the right-hand side of ( [T^ The linear system of equations in ( |20) i thus becomes: 
converges to zero at a rate faster than we have: 

i/nvec(A) = \/n v^E(F)vec( 

+ Op(l). 


A) 

( 20 ) 


V^vec(A) = v^(Iw-F)-i (^(Sg ^)ESo ^+Op(l). 
Writing vec(A) = ^ ^Sg (g)Sg ^ j we hnally 


obtain: 


It remains thus to compute E(F) and to check that its spectral 
norm is less than 1. We will start by controlling the spectral 
norm of E(F). Recall that E(F) is given by: 


= (^(Sg^)ESg= j (IaT^-F)-! (^(Sg =)ESg 

+ Op(l). 


•s/n^ 


E(F) = IV(l-p) 

( 


1 1 


vec I Sg ^XX*Sg 


xE 


= lV(l-p)E 


= N{l-p)E 


(x-(Sg^) 


'^'^X*Sn ^ 


(x*Sg-lx)2 
((Sg-^)^X0Sg-^x) (x^(Sg-^) 


'^'^X*Sn ^ 


0 


(x*Sg-x)^ 


((Sg-^)W(Sg ")^)®(Sg ^XX^Sg 


(x*Sg-lx) 


It can be easily noticed that: —!_ ^ Ijy. There- 

x*Sj, X 

fore. 


Thus, y/nS behaves as a zero-mean Gaussian distributed vector 
with covariance: 

Ml = (^(s|)Es|^ (I^2-F)-i (^(Sg ^)ESo Ml 

X (^(Sg-^)ESg-^) (Iw 2-F)-1 (^(s|)ESg^) 

and pseudo-covariance: 

M2 = (^(s|)Es|^ (Ijv 2 -F)-' (^(Sg ")ESo M2 

X (^Sg-5®(Sg-^)') (1^2 -n-1 (^s|®(s|)'^ 

This completes the proof. 


E(F) ^ N{1-p)1n<^E 


Sq ^XX*Sg ^ 
x*S];'^x 


= IaT® (Iat— pSg 


thus implying 

||E(F)|| < ||liv-pSgi|| <1. 
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